1 Summary

In this document, we propose a number of cleanup changes to the specification of the CABAC
entropy coding part in the current draft [1]. Although the proposed changes to the draft text
are very minor, the resulting improvements with respect to complexity and coding efficiency
are quite significant.

2 Proposed Technical Changes 

2.1 Coding of Intra Prediction Modes for 4x4 Luma Blocks

In the current draft, coding of Intra 4x4 and SIntra 4x4 prediction modes for a given block uses
18 different context models depending on the prediction mode chosen for the neighboring 4x4
block to the left. In JVT-D025 [2], a proposal was made to simplify context modeling in that
case. More specifically, it was proposed to use only two different context models. The first
model is used to encode the binary-valued parameter use_most_probable_mode, which
indicates whether the most probable mode, derived from the modes chosen for the neighboring
blocks, is used for a given block. The parameter remaining_mode_selector, which indicates the
chosen prediction mode when the most probable mode is not used, is encoded with a second
model in conjunction with a truncated unary binarization. It was shown that these
simplifications did not hurt the coding efficiency for the common test set [2].

We re-examined the proposed simplifications of JVT-D025 and found that a further modification
is useful. In contrast to JVT-D025, we have chosen a fixed-length binarization with a length of 3
bits to encode the remaining_mode_selector. Since a 3-bit fixed-length codeword is transmitted
for the remaining_mode_selector in the non-CABAC case, a better harmonization with
the CAVLC/UVLC entropy coding mode is achieved.
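The difference between the two binarizations can be illustrated with a minimal sketch. It assumes that remaining_mode_selector takes the values 0 to 7 (as implied by the 3-bit codeword), that the truncated unary code is written as a run of ones closed by a zero, and that the fixed-length code is emitted most significant bit first. The bin polarity and bit order are assumptions made for illustration and are not taken from the draft; the function names are hypothetical.

```python
def truncated_unary(value, max_value=7):
    """Truncated unary: `value` ones followed by a terminating zero.

    The terminating zero is dropped when value == max_value.
    The polarity (ones, then zero) is an assumed convention.
    """
    if not 0 <= value <= max_value:
        raise ValueError("value out of range")
    bins = [1] * value
    if value < max_value:
        bins.append(0)
    return bins


def fixed_length(value, num_bits=3):
    """Fixed-length binarization, most significant bit first (assumed order)."""
    if not 0 <= value < (1 << num_bits):
        raise ValueError("value out of range")
    return [(value >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]


if __name__ == "__main__":
    for selector in range(8):
        print(selector, truncated_unary(selector), fixed_length(selector))
```

With the fixed-length form every value produces exactly 3 bins, whereas the truncated unary form produces between 1 and 7 bins under these assumptions.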

To evaluate the performance of our proposed simplification in conjunction with JVT-D025, a set
of experiments was conducted using JM 4.0d as a common basis. The results of these
simulations for the common test set and the interlaced test set are summarized in Tables 1 and
2, respectively. In comparison to the method currently specified in the FCD, an
average bit rate reduction of approximately 0.4% was achieved in all observed cases.



Table 1: Average Bjontegaard Delta bit rate savings in % for the QP-range = 28, 32, 36, 40 for the 
common test set (QCIF/CIF) using the first intra frame only for a performance comparison of the proposed 
changes to the FCD (positive values reflect a bit rate reduction). 

Table 2: Average Bjontegaard Delta bit rate savings in % for the QP-range = 28, 32, 36, 40 for the
interlaced test set using the first intra frame only for a performance comparison of the proposed changes 
to the FCD (positive values reflect a bit rate reduction). 

2.2 Simplified Coding of Intra Prediction Modes for Chroma Blocks

A rather minor change is proposed to simplify coding of intra prediction modes related to
chroma blocks. Here we propose to reduce the number of different context models from 5 to 4.
To be more specific, we propose to set max_idx_ctx_id = 2 instead of max_idx_ctx_id = 3, as
specified in the current draft. No negative impact on coding efficiency has been observed
with this change.

2.3 Changes for MB-Adaptive Frame/Field Coding

For a frame picture in MB-level adaptive coding of interlaced sources, the selection of frame or
field coding is done at MB level. Thus, a binary-valued frame/field flag mb_field_decoding_flag
is required to indicate whether the two MBs of an MB pair shall be decoded in frame or field mode.

For coding of mb_field_decoding_flag, we propose the following context model. Let A and B
denote the spatially neighboring MB pairs to the left and on top of the current MB pair X,
respectively. We then choose the conditioning term ctx_mb_field(X) according to the template
ctx_var_spat1 as specified in [1]:

    ctx_mb_field(X) = ((mb_field_decoding_flag(A) != 0) ? 1 : 0)
                    + ((mb_field_decoding_flag(B) != 0) ? 1 : 0)

This results in 3 additional context models for coding of mb_field_decoding_flag.
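As a minimal sketch, the conditioning term can be written as a small function. The function name and its two integer arguments are hypothetical stand-ins for the flags of the neighboring MB pairs A and B; how unavailable neighbors are treated is not covered here.

```python
def ctx_mb_field(flag_a, flag_b):
    """Context index for mb_field_decoding_flag of the current MB pair X.

    flag_a, flag_b: mb_field_decoding_flag of the MB pair to the left (A)
    and of the MB pair on top (B). Each non-zero flag contributes 1.
    """
    return (1 if flag_a != 0 else 0) + (1 if flag_b != 0 else 0)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(f"A={a} B={b} -> context {ctx_mb_field(a, b)}")
```

The index takes one of the three values 0, 1, or 2, which corresponds to the 3 additional context models.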

Coding of all other syntax elements is done in exactly the same way as specified in the FCD.
Only the semantics of a neighboring macroblock used for conditional coding of the mb_skip_flag
has to be specified more precisely for the case of MB-adaptive frame/field coding. Since the
decoder may not know the specific choice of frame or field mode when decoding the
mb_skip_flag, it is always assumed that co-location of macroblocks for context determination
has to be interpreted as co-location in the sense of pure frame coding. This means, for
instance, that the macroblock B on top of the current skipped macroblock X in a (potential) top
field position is always chosen as the bottom field MB of the MB pair above the current MB X.

2.4 Changes for Improving the Arithmetic Coding Engine 

2.4.1 Modified Renormalization Bound and Initialization of Range 

In the current FCD [1], renormalization in the arithmetic coding engine is performed such that
the range R of the coding interval stays within [Q+1, Q+2, ..., 2*Q], where Q = 0x4000. This
convention has the drawback that the underlying renormalization condition R <= Q, which has to
be tested each time the range R has been modified, cannot be implemented by using simple bit
operations. In addition, for specifying the LPS related sub-interval range R_LPS from the given
range R, a further subtraction is needed: R_LPS = RTAB[state][(R - 0x4001) >> 12].



To avoid these operations, we propose to modify the renormalization bound such that
R is in [Q, Q+1, ..., 2*Q-1]. This implies that the comparison for renormalization is replaced by
the test of the condition R < Q (which can be realized by a simple bit test), and that the
computation of the LPS related range R_LPS is now given by R_LPS = RTAB[state][(R >> 12) & 0x03].
At the same time, however, we have to change the initial value of R from 2*Q, as currently
specified in the draft, to the value of 2*Q-2. The reason for not choosing 2*Q-1 as the initial
value is that, for a proper equi-sized sub-interval partitioning in the case of a
decoder bypass for uniform pdfs (as specified in subclause 9.2.4.2.5 of [1]), we have to maintain
an even-valued range.
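The effect of the modified bound can be checked with a short sketch. It compares the table index of the current convention with the proposed one and expresses the renormalization condition R < Q as a test of a single bit. The function names are hypothetical, the contents of RTAB are not needed for the comparison, and the bit test is only claimed to hold for 0 < R < 2*Q.

```python
Q = 0x4000  # renormalization bound as given in the text


def lps_index_current(r):
    """Table index for R in [Q+1, 2*Q]; needs a subtraction."""
    return (r - 0x4001) >> 12


def lps_index_proposed(r):
    """Table index for R in [Q, 2*Q-1]; shift and mask only."""
    return (r >> 12) & 0x03


def needs_renormalization(r):
    """Proposed condition R < Q as a single-bit test (valid for 0 < R < 2*Q)."""
    return (r & Q) == 0


if __name__ == "__main__":
    # Shifting the admissible interval down by one maps each old index
    # onto the same new index.
    same = all(lps_index_current(r + 1) == lps_index_proposed(r)
               for r in range(Q, 2 * Q))
    print("indices agree:", same)
    print("initial range 2*Q-2 is even:", (2 * Q - 2) % 2 == 0)
```

Under these assumptions both index computations select the same one of four table columns, and the bit test agrees with the comparison R < Q for every admissible range value.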

2.4.2 Simplified Decoder Bypass 

As currently specified in the draft, the minimum precision for representing the range and value
registers at the decoder is 16 bits. In fact, as part of this proposal (see section 2.4.3), we
will show that this precision can be further reduced by means of a simple change in the
initialization part of the coding engine.

Under the assumption that one bit more of precision than the minimum number of bits is
available for representing the R (range) register and V (value) register at the decoder, a further
simplification for the decoder bypass is possible. Instead of computing R_half = R/2 in the first step
as specified in subclause 9.2.4.2.5 of [1], we propose to first double the value register V, i.e.,
V = V << 1. This way the computation of R_half = R/2 can be omitted, and each of the subsequent
steps involving R_half can be replaced by the corresponding operation in which R_half is replaced
by R, as depicted in Figure 1.




[Flowchart; last block: V = V | ((B >> BG) & 1), followed by "Done"]



Figure 1: Flowchart of simplified decoder bypass 
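The equivalence behind the simplified bypass can be illustrated with a minimal sketch. The baseline function is an assumed reading of a conventional bypass step (halve R, compare, subtract, then shift in the next bit) and is not a transcription of subclause 9.2.4.2.5; the function names and the argument next_bit are hypothetical. The simplified variant doubles V first, compares against R directly, and inserts the new bit last.

```python
def bypass_baseline(r, v, next_bit):
    """Assumed baseline: compare V against R/2, then shift a new bit into V."""
    r_half = r // 2  # exact only because R is kept even
    bin_val = 0
    if v >= r_half:
        bin_val = 1
        v -= r_half
    v = (v << 1) | next_bit
    return bin_val, v


def bypass_simplified(r, v, next_bit):
    """Simplified variant: double V first, so R/2 is never computed."""
    v <<= 1  # this is where the one additional bit of precision is needed
    bin_val = 0
    if v >= r:
        bin_val = 1
        v -= r
    v |= next_bit  # 2*V - R is even for even R, so the LSB is free
    return bin_val, v


if __name__ == "__main__":
    r, v = 0x7FFE, 0x5000
    print(bypass_baseline(r, v, 1), bypass_simplified(r, v, 1))
```

Both variants return the same decoded bin and the same updated value register for every even R, which is why the initial value of R has to keep the range even.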



2.4.3 12-bit Implementation of the Coding Engine 

As specified in subclause 9.2.4.2.2 of the current draft [1], the LPS related range values R_LPS are
actually given in 8-bit precision. For convenient access in a 16-bit architecture, however, the
corresponding table entries of RTAB are given as multiples of 64. Thus, under the condition that
the range R has to be represented by an even value, there are 5 LSB bits that are unused in a
16-bit implementation. By further changing the initial value of R from the value of 2*Q-2 as
proposed in section 2.4.1 to the value of 2*Q-2*32, we arrive at a minimum precision of
11 bits for both the R and the V register by simply downshifting (right shifting) the 5 unused LSB
bits. Taking into account the additional bit of precision required for the simplified decoder
bypass (see section 2.4.2), we finally arrive at a minimum of 12 bits of precision for the
registers R and V representing the arithmetic decoding engine.

2.4.4 Speedup of Renormalization (non-normative) 

If a higher precision is available for representing R and V, for example, in a
software-based implementation with a 32-bit integer representation, the following trick can be
used to speed up the renormalization loop.

According to the specification given in [1], a bit is inserted in the least significant bit of V each 
time a renormalization shift occurs, as is shown in the last block of the flowchart in Fig. 1. By left
shifting V, R, all RTAB entries, and all related bounds and initial values by 7 bits, i.e., by
multiplying these quantities by 2^7, which requires a minimum register precision of 19 bits
according to the changes proposed in the previous section, a full byte can be inserted in the 8 
least significant bits of the V register for each call of the GetByte procedure. In fact, by using an 
even higher precision of 27 bits, which corresponds to a left shift of 15 bits, the byte-wise 
insertion can be further replaced by a word-wise insertion. This way the renormalization loop is 
reduced to essentially three simple operations, which can be done in parallel: two shift 
operations (for the registers R and V) and one counter decrement. 

3 Verification of JVT-D019 (Constrained Arithmetic Coding)

In this section we provide some experimental results for verification of the constrained arithmetic
coding as presented in JVT-D019 [3]. For the previously specified constrained arithmetic coding
in the CD [4], a quite significant loss in coding efficiency of up to 7% overall bit rate increase
was reported [3]. We re-examined the new constrained arithmetic coding algorithm with respect
to the specific low bit-rate coding scenarios investigated in [3]. For this experiment we used the
test model, version JM 4.0d, which already includes the constrained arithmetic coding as
proposed in JVT-D019. We compared two variants of the encoder: the original version using the
method of constrained coding (triggered by switching the pre-processor flag
NEW_CONSTRAINT_AC on), and a second version in which the constrained arithmetic coding
has been disabled (by commenting out the pre-processor flag NEW_CONSTRAINT_AC). The
results of these experiments, which can be found in the accompanying Excel file JVT-E059.xls,
clearly show that there is no difference in coding efficiency between the two versions for the
observed cases.